Abstract: FDG-PET images of patients suffering from
Alzheimer's disease (AD) were obtained from the Paul Scherrer
Institute, Villingen, Switzerland. The data were acquired with a
CTI/Siemens ECAT 933/04-16 scanner and comprised 7
image slices of 128 x 128 pixels each. The study included 48
clinically diagnosed AD patients and 73 normal controls.
Features were extracted using an invariant feature extraction
method; they are invariant to translation and rotation of
object(s) within the image. The subjects were separated into
two groups, one for training (24 AD and 37 normal controls)
and one for cross-validation testing (24 AD and 36 normal
controls). Discriminant function analysis using these features
yielded a classification accuracy of 88% sensitivity and 86%
specificity.
I. Introduction

In this paper we utilise the method for constructing
invariant features for gray scale images proposed by
Schulz-Mirbach [4]. The method produces features that are
invariant to rotation and translation but not to scaling,
which limits its use. We first introduce the essentials of the
action of transformation groups on gray scale images and
the basic concepts for calculating invariant gray scale
features by evaluating the necessary integrals over the image.

II.  Theory  of  invariant  gray  scale  features 

Let M be a gray scale image, where M[x, y] is the gray
value at pixel coordinates (x, y). In order to formulate the
theory both continuous and discrete cases are considered.
In the discrete case the pixel coordinates (x, y) are integers
in the range $0 \le x < N_x$, $0 \le y < N_y$, where $N_x$ and $N_y$
are the dimensions of the image. In the continuous case
the pixel coordinates can be real numbers.

Rotation and translation will be described by the action
of the transformation group G with elements $g \in G$ on the
images. For an image M and a group element $g \in G$
the transformed image is denoted by $gM$ [7]. For an
image translated by $t = (t_x, t_y)^T$ and rotated by an angle
$\varphi \in [0, 2\pi]$ we have

$$(gM)[x, y] = M[k, l] \quad \text{with} \quad \begin{pmatrix} k \\ l \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} \qquad (1)$$

All indices are taken modulo the image dimensions. Due to
the periodic boundary conditions the range of the components
of the translation vector t is restricted to
$0 \le t_x < N_x$, $0 \le t_y < N_y$, which is
the size of the image. In the discrete formulation pixel
coordinates are restricted to integers. Since the vector $(k, l)^T$ in
equation 1 is likely to have non-integer values, appropriate
rounding or interpolation is necessary.
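
As an illustration of equation 1 in the discrete case, the following is a minimal Python/NumPy sketch. The function name `transform_image` and the choice of nearest-neighbour rounding are assumptions made for this example; the text only requires "appropriate rounding or interpolation", so this should not be read as the implementation used in the study.

```python
import numpy as np

def transform_image(image, phi, t):
    """Return gM with (gM)[x, y] = M[k, l], as in equation 1.

    Assumptions of this sketch: nearest-neighbour rounding of (k, l)
    and periodic (wrap-around) boundary conditions.
    """
    img = np.asarray(image)
    nx, ny = img.shape
    # Integer pixel coordinates (x, y) of the output image.
    x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    c, s = np.cos(phi), np.sin(phi)
    # Rotate the coordinates, then add the translation vector t = (t_x, t_y).
    k = c * x - s * y + t[0]
    l = s * x + c * y + t[1]
    # Round to the nearest pixel and wrap indices modulo the image size.
    k = np.rint(k).astype(int) % nx
    l = np.rint(l).astype(int) % ny
    return img[k, l]

# Small synthetic example: a pure translation is a cyclic shift.
img = np.arange(16).reshape(4, 4)
shifted = transform_image(img, 0.0, (1, 2))
```

With `phi = 0` the transformation reduces to a cyclic shift of the image, which is an easy way to check the index convention.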


An invariant image feature is a function F(M) which is
invariant to the action of the transformation group on the
images, i.e.

$$F(gM) = F(M) \quad \forall g \in G. \qquad (2)$$

So  feature  F  will  remain  constant  even  if  image  M  is 
transformed  by  g. 

The transformation law (1) states that "an image
transformation consists of a rotation around the rotation centre
followed by translation. This rotation centre is not known
a priori and it does not necessarily fall inside the image.
However, by applying an appropriate translation it is
possible to bring the coordinate origin to the rotation centre.
Since we are seeking features which are invariant both to
rotation and translation the position of the rotation centre
does not matter." [6]

A.  Constructing  invariant  features 

According to Schulz-Mirbach [4], [7] it is possible to
construct an invariant feature F(M) by integrating f(gM)
over the transformation group G:

$$F(M) = A[f](M) = \int_G f(gM)\, dg \qquad (3)$$

where A[f] is called the average of f. This averaging
technique is described in greater detail in [5]. Since we are
considering the group of image rotations and translations
with cyclic boundary conditions, the integration over the
transformation group can be written as

$$A[f](M) = \frac{1}{2\pi N_x N_y} \int_{t_x=0}^{N_x} \int_{t_y=0}^{N_y} \int_{\varphi=0}^{2\pi} f(gM)\, d\varphi\, dt_x\, dt_y \qquad (4)$$

Therefore, if the function f(M) is already invariant, i.e.
f(M) = f(gM), it remains unaltered by the group
averaging, so A[f](M) = f(M).

Equation 4 can be implemented by a two-step strategy:
in the first step f is calculated for each pixel, and
in the second step the integral of all these results is
computed. In the discrete domain this is simply the sum of all
the results obtained by evaluating f. Figure 1 describes
this process schematically.
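
For reference, one way to write the discrete counterpart of equation 4 is given below. The number of sampled rotation angles P and the evenly spaced angles $\varphi_p$ are assumptions introduced for this sketch; the text does not state how the angle integral was discretised.

```latex
A[f](M) \approx \frac{1}{P\, N_x N_y}
  \sum_{t_x=0}^{N_x-1} \sum_{t_y=0}^{N_y-1} \sum_{p=0}^{P-1}
  f\bigl(g_{\varphi_p,\, t_x,\, t_y} M\bigr),
\qquad \varphi_p = \frac{2\pi p}{P}
```

The inner sum over p is the local computation carried out at each pixel, and the two outer sums are the summation over all pixels.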

If we consider the monomial example f(M) = M[0, 0],
we can deduce $(gM)[0, 0] = M[t_x, t_y]$ from equation 1. Then
the group average, which is the feature, is given by

$$A[f](M) = \frac{1}{N_x N_y} \int_{t_y=0}^{N_y} \int_{t_x=0}^{N_x} M[t_x, t_y]\, dt_x\, dt_y \qquad (5)$$

In this case the result is simply the average gray value
of the image.
III. Experimentation


[Figure 1: flow diagram. Gray scale image M -> evaluation of a
local function for every pixel of the image M -> summation over
the results of the local computations.]

Fig. 1. Calculating invariant features


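If we consider the monomial example $f(M) = M[0, 0]\, M[5, 0]$,
we can again deduce
$(gM)[0, 0]\, (gM)[5, 0] = M[t_x, t_y]\, M[5\cos\varphi + t_x,\ 5\sin\varphi + t_y]$
from equation 1. Thus the feature is given by

$$A[f](M) = \frac{1}{2\pi N_x N_y} \int_{t_y=0}^{N_y} \int_{t_x=0}^{N_x} \int_{\varphi=0}^{2\pi} M[t_x, t_y]\, M[5\cos\varphi + t_x,\ 5\sin\varphi + t_y]\, d\varphi\, dt_x\, dt_y \qquad (6)$$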


This equation can be evaluated by the two-step strategy,
where the local function is

$$\int_{\varphi=0}^{2\pi} M[t_x, t_y]\, M[5\cos\varphi + t_x,\ 5\sin\varphi + t_y]\, d\varphi \qquad (7)$$

Here the kernel operates on a neighbourhood of radius
5 pixels. We then sum all the local computations. This
process is explained in more detail in section III-B.

B.  Monomial  properties 

The method described yields features that are invariant
not only with respect to a global image transformation,
where a single angle and translation vector describe the
transformation of the whole image, but also with respect to
several local transformations. This holds if there is only
moderate overlap between the local transformation regions,
i.e. as long as the object separation in the scene is greater
than the kernel size of the monomials [6].

This property is especially beneficial when we are
considering images of the brain. Certain regions of the brain
are affected with the onset of AD; however, the position of
these regions varies slightly between patients. Typically
one would register the images to determine the position
of these regions for analysis. By utilising the position
invariance of this method one does not necessarily need to
register the images. Each of these regions (objects)
will contribute to the calculated feature irrespective of
its position within the brain.

Furthermore, Schulz-Mirbach et al. [6] have shown that,
provided objects do not overlap, the invariant features are
approximately additive. This means that if we obtain
features for two given objects and both are then present in
a scene, the feature value of the scene will be approximately
the sum of the independent feature values.

The  effect  of  AD  on  each  region  of  the  brain  varies.  For 
a  given  patient  not  all  regions  are  affected.  The  additive 
nature  of  the  features  will  enable  the  features  to  have  the 
cumulative  effects  of  all  the  regions  that  exhibit  the  effects 
of  AD. 


A. Data acquisition

FDG-PET images of patients suffering from AD and
normal controls were obtained from the Paul Scherrer
Institute (PSI), Villingen, Switzerland. The data were
acquired with a CTI/Siemens ECAT 933/04-16 scanner over
a period of 2 years. The scanning protocol remained the
same throughout the study to eliminate the
chance of any systematic errors being introduced. The
data supplied had been reconstructed using filtered back
projection and consisted of the first 16 frames of a
dynamic scan, comprising 7 image slices taken axially.
The slice separation was approximately 8 mm, resulting in
a total field of view of 56 mm, and each slice was 128 x 128
pixels.

The data comprised 73 normal controls and 48 patients
clinically diagnosed with AD using the criteria of the
National Institute of Neurological and Communicative
Disorders and Stroke and the Alzheimer's Disease and Related
Disorders Association (NINCDS-ADRDA) [1]. For this
study the dynamic nature of the data was not needed,
so the 16 frames for each subject were summed to
reinforce the signal.

The data were separated into two groups, one for training
and one for testing. The training set contained 37 normal
controls and 24 AD patients, and the test set contained
36 normal controls and 24 AD patients.

The background of the image influences the local
calculations of f. It is claimed that if the background is
homogeneous then the impact on the calculation is insignificant
[6]. In our images the background does not appear to be
homogeneous, therefore it becomes necessary to extract
the object from the background. Thus the images were
manually segmented from the noisy background. Once
this was done the invariant features were calculated.

B.  Monomials 

The following monomials, which were found to perform
well by Schael et al. [3], were used for this study. Although
Schael et al. used these for an industrial inspection task to
recognise defects in textures, they seem to perform well
in the task at hand compared to other arbitrarily chosen
monomials.

$f_1(M) = M[0, 0]\, M[5, 0]$

$f_2(M) = M[3, 0]\, M[0, 8]$

$f_3(M) = M[0, 0]\, M[5, 0]\, M[0, 10]$

The monomials can be thought of as producing a
local window inside which all calculations are performed.
If we consider the monomial $f_1$, this will produce a
window of radius 5 pixels (see figure 2). In the window the
first factor of the monomial selects the central pixel, which is
multiplied by a pixel at a distance of 5 pixels determined by
the second factor. This is repeated for angles from 0 to $2\pi$
and the results are summed. This is essentially the local part,
which is repeated for all pixels to produce the feature.

In practice, to compute $f_1$ we consider a pixel and first
multiply it with all pixels at distance 5 from it. Then we
find the average value of these pairwise multiplications.
This is equivalent to integrating over all rotation angles.
This way we have a rotation invariant number assigned to
each pixel. Finally we average all these numbers to find a
single translation invariant number that characterises the
whole region.
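
The procedure just described can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `monomial_feature`, the parameter `n_angles` (number of sampled rotation angles), nearest-neighbour rounding and periodic wrap-around are all choices made for this example, and the image is random synthetic data rather than PET data.

```python
import numpy as np

def monomial_feature(image, offsets, n_angles=16):
    """Approximate the group average of a monomial.

    offsets: list of (x, y) positions, one per factor, e.g.
    [(0, 0), (5, 0)] for M[0,0] M[5,0]. Assumed here: n_angles evenly
    spaced rotation angles, nearest-neighbour rounding, periodic borders.
    """
    img = np.asarray(image, dtype=float)
    total = 0.0
    for phi in 2.0 * np.pi * np.arange(n_angles) / n_angles:
        c, s = np.cos(phi), np.sin(phi)
        prod = np.ones_like(img)
        for x, y in offsets:
            # Rotate the offset and round it to the pixel grid.
            k = int(np.rint(c * x - s * y))
            l = int(np.rint(s * x + c * y))
            # np.roll with a negative shift gives shifted[p] = img[p + (k, l)].
            prod *= np.roll(img, shift=(-k, -l), axis=(0, 1))
        # Average over all pixels (translation part of the group average).
        total += prod.mean()
    # Average over the sampled angles (rotation part).
    return total / n_angles

rng = np.random.default_rng(0)
image = rng.random((32, 32))  # synthetic stand-in, not PET data
f1 = monomial_feature(image, [(0, 0), (5, 0)])
shifted = np.roll(image, shift=(3, 7), axis=(0, 1))
f1_shifted = monomial_feature(shifted, [(0, 0), (5, 0)])
```

In this sketch a cyclic shift of the image leaves the value unchanged, and the single-factor monomial with offset (0, 0) returns the mean gray value, in line with equation 5. The same function accepts the offsets of the other two monomials, for example `[(3, 0), (0, 8)]`.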


Fig. 2. Monomial $f_1$: action of a local kernel

In order to compute $f_2$ we consider a pixel at a distance of
3 pixels from the centre of the window. This is multiplied
with a pixel at a distance of 8 pixels from the centre of the
window. The two pixels are at 90 degrees to each other with
respect to the centre of the window. The average of all such
pairwise multiplications over all angles gives the
rotation invariant number (see figure 3). Then the average
over all these local windows is taken to produce a translation
invariant number, which characterises the whole region.
Fig. 3. Monomial $f_2$: action of a local kernel; arrows indicate pixels
that are multiplied pairwise

The computation of $f_3$ requires a much larger local
window, namely one of radius 10 pixels. Here, as in feature $f_1$, the
central pixel in the local window is multiplied by a pixel
at distance 5, and the product is in turn multiplied by a pixel
at distance 10. The pixels form the vertices of a right-angled
triangle (see figure 4). The average of all these
multiplications over all angles produces the rotation invariant
number. This is done for each pixel and averaged to
produce the translation invariant number, which characterises
the whole region.

Fig. 4. Monomial $f_3$: action of a local kernel; arrows indicate
pixels that are multiplied together

Applying these 3 monomials resulted in 21 (7 slices x
3) features in total per patient. These invariant features
were used to perform discriminant function analysis on the
training data and produce discriminant functions using
the package Statistica [8]. Individual features were tested,
as well as combinations of several features, to obtain good
classification accuracy. In order to test the accuracy, the
discriminant function obtained from the training phase
was used to classify the test data.
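
To make the train-then-test procedure concrete, here is a minimal sketch of a two-class linear discriminant function in Python with NumPy. It is not the Statistica procedure used in the study: the function names `fit_linear_discriminant` and `classify`, the equal-prior decision threshold and the synthetic Gaussian data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_discriminant(X, y):
    """Fisher's linear discriminant for two classes labelled 0 and 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix.
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold halfway between the class means (assumes equal priors).
    b = -0.5 * w @ (m0 + m1)
    return w, b

def classify(X, w, b):
    """Return 1 where the discriminant score is positive, else 0."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# Synthetic stand-in for feature vectors (3 features per subject).
rng = np.random.default_rng(0)
def make_set(n):
    X = np.vstack([rng.normal(0.0, 1.0, (n, 3)), rng.normal(4.0, 1.0, (n, 3))])
    y = np.repeat([0, 1], n)
    return X, y

X_train, y_train = make_set(30)
X_test, y_test = make_set(30)
w, b = fit_linear_discriminant(X_train, y_train)
test_accuracy = (classify(X_test, w, b) == y_test).mean()
```

The discriminant is fitted on the training set only and then applied unchanged to the test set, which mirrors the separation between the two phases described above.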

IV.  Results  and  Discussion 

Using all 21 features for classification yielded a
classification accuracy of 97% during the training phase. The
classification accuracy reduced to 80% on the test data
(see table I).

TABLE I

Classification accuracy using all 21 invariant features

a. Training

True      Predicted AD   Predicted Normal   Correct
AD             24               0             100%
Normal          2              35              95%
Total                                          97%

b. Testing

True      Predicted AD   Predicted Normal   Correct
AD             17               7              71%
Normal          4              33              89%
Total                                          80%
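
The per-class percentages in table I follow directly from the confusion counts. As a small worked example in Python (the helper name `sensitivity_specificity` is introduced here for illustration), using the testing counts of table I with AD as the positive class:

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Table I, testing: true AD -> 17 predicted AD, 7 predicted normal;
# true normal -> 4 predicted AD, 33 predicted normal.
sens, spec = sensitivity_specificity(tp=17, fn=7, fp=4, tn=33)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
# prints "sensitivity 71%, specificity 89%"
```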

Although the training classification accuracy was
extremely high, the resulting testing accuracy was not as
good. This can be explained by the fact that using so
many features resulted in overtraining: the discriminant
function described the training data too
well, and the testing data, being slightly different,
were not classified as well.

The best 3 and the best 7 features were also used to
classify the cases (see tables II and III). The best features
were selected by Statistica using a measure of F to
enter. Here, the program selects for inclusion in the feature
set the feature that makes the most significant additional
contribution to the discrimination between groups; that
is, the program chooses the variable with the largest F
value. The F value for a variable indicates its statistical
significance in the discrimination between groups [8].
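
As a rough illustration of ranking features by an F value, the sketch below computes the ordinary one-way ANOVA F statistic of each feature on its own and sorts the features by it. This corresponds only to the first step of a forward selection; in later steps a stepwise procedure would use F values conditional on the features already entered, and the exact computation performed by Statistica is not reproduced here. The function names `f_statistic` and `rank_features` and the synthetic data are assumptions of this example.

```python
import numpy as np

def f_statistic(x, y):
    """One-way ANOVA F value of a single feature x for group labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    groups = [x[y == g] for g in np.unique(y)]
    grand_mean = x.mean()
    # Between-group mean square.
    between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    between /= len(groups) - 1
    # Within-group mean square.
    within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    within /= len(x) - len(groups)
    return between / within

def rank_features(X, y):
    """Return feature indices sorted by decreasing F value, and the F values."""
    X = np.asarray(X, dtype=float)
    scores = np.array([f_statistic(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Synthetic example: only feature 1 separates the two groups.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
features = rng.normal(size=(40, 3))
features[labels == 1, 1] += 3.0
order, scores = rank_features(features, labels)
```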

TABLE II

Classification accuracy using the best 7 invariant monomial
features ($f_1(1)$, $f_3(1)$, $f_1(7)$, $f_2(1)$, $f_2(6)$, $f_1(3)$, $f_1(4)$) (the
number in brackets corresponds to image slice number)

a. Training

True      Predicted AD   Predicted Normal   Correct
AD             23               1              96%
Normal          3              34              92%
Total                                           -

b. Testing

True      Predicted AD   Predicted Normal   Correct
AD             20               4              83%
Normal          7              30              81%
Total                                          82%

TABLE III

Classification accuracy using the best 3 invariant monomial
features ($f_1(1)$, $f_3(1)$, $f_1(7)$) (the number in brackets
corresponds to image slice number)

a. Training

True      Predicted AD   Predicted Normal   Correct
AD             21               3              88%
Normal          3              34              92%
Total                                           -

b. Testing

True      Predicted AD   Predicted Normal   Correct
AD             21               3              88%
Normal          5              32              86%
Total                                          87%

What we observe here is that reducing the number of
features used in the classification reduces the classification
accuracy on the training data. This is desirable, as we do
not want to over-fit the data. As a result, the
classification accuracy on the testing data, which is the more
important one, improves, even if only slightly. The
classification accuracies for training and testing are much more
similar when fewer features are used. In addition, notice that
using many features biases the system towards false
negatives, thus reducing its sensitivity, which is very important
in a real diagnostic situation.

Further reduction in the number of features used does
not improve the performance but in fact deteriorates it.
This leads to the conclusion that using too many features
over-fits the data, so the predictive ability of the system
for new data is somewhat limited. On the other hand, using
too few features does not describe the classes adequately,
hence performance is low in both training and testing. The
best performance appears to be obtained when combining 3
features. These features are the values of the monomials $f_1$
and $f_3$ computed for slices 1 and 7.

It is a well known fact that AD affects the hippocampus
and cerebral cortex most severely [2]. The best 3
features correspond to slices 1 and 7, which depict these parts
of the brain.

Although AD affects the brain globally, some regions
are more affected than others. The position of these
regions in the brain varies slightly between patients. A fully
global feature extraction method will not take this into
account. However, the local rotation and translation
invariance of the monomial method does, and this explains its
performance.


